Variational Monte Carlo with the Multi- Scale Entanglement Renormalization Ansatz 
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Monte Carlo sampling techniques have been proposed as a strategy to reduce the computational cost of 
contractions in tensor network approaches to solving many-body systems. Here we put forward a variational 
Monte Carlo approach for the multi-scale entanglement renormalization ansatz (MERA), which is a unitary 
tensor network. Two major adjustments are required compared to previous proposals with non-unitary tensor 
networks. First, instead of sampling over configurations of the original lattice, made of L sites, we sample over 
configurations of an effective lattice, which is made of just log(L) sites. Second, the optimization of unitary 
tensors must account for their unitary character while being robust to statistical noise, which we accomplish 
with a modified steepest descent method within the set of unitary tensors. We demonstrate the performance of 
the variational Monte Carlo MERA approach in the relatively simple context of a finite quantum spin chain at 
criticality, and discuss future, more challenging applications, including two dimensional systems. 

PACS numbers: 05.10.-a, 02.50.Ng, 03.67 .-a, 74.40.Kb 
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I. INTRODUCTION 

Understanding the collective behaviour of quantum many- 
body systems remains a central topic in modern physics, as 
well as one of the greatest computational challenges in sci- 
ence. Quantum Monte Carlo sampling techniques are capa- 
ble of addressing a large class of (unfrustrated) bosonic and 
spin lattice models, but fail when applied to other models such 
as frustrated antiferromagnets and interacting fermions due to 
the so-called sign problem. Variational approaches, on the 
other hand, are sign-problem free but are typically strongly 
biased towards specific many-body wavefunctions. An im- 
portant exception is given by the density matrix renormaliza- 
tion group^ a variational approach based on the matrix prod- 
uct state (MPS)^ which is capable of providing an extremely 
accurate approximation to the ground state of most one- 
dimensional lattice models. The success of DMRG is based 
on the fact that an MPS can reproduce the structure of entan- 
glement common to most ground states of one-dimensional 
lattice models. 

In order to extend the success of DMRG to other con- 
texts, new tensor networks generalizing the MPS have been 
proposed. For instance, the multi-scale entanglement renor- 
malization ansatz (MERA)^ with a network of tensors that 
extends in an additional direction corresponding to length 
scales, is particularly suited to address quantum critical sys- 
tems. Most significant has also been the proposal of tensor 
networks for systems in two and higher dimensions, where the 
MPS becomes inefficient. Scalable tensor networks include 
the projected entangled-pair states PEPS 4 (a direct generaliza- 
tion of the MPS to larger dimensions) and higher dimensional 
versions of the MERAi^ They can be used to address frus- 
trated antiferromagnets and interacting fermions, since they 
are free of the sign problem experienced by quantum Monte 
Carlo approaches. 

In a tensor network state, the size of the tensors is mea- 
sured by the bond dimension %. This bond dimension x indi- 
cates how many variational coefficients are used. Crucially, it 



also regulates both the cost of the simulation, which scales 
as 0(x p ) for some large power p, and how much ground 
state entanglement the many-body ansatz can reproduce. In 
the large x regime, PEPS and MERA are essentially unbi- 
ased methods, but with a huge computational cost that is of- 
ten unaffordable. More affordable simulations are obtained 
in the small x regime, but there these methods are biased in 
favour of weakly entangled phases (e.g. symmetry-breaking 
phases) and against strongly entangled phases (e.g. spin liq- 
uids and systems with a Fermi surface). Identifying more ef- 
ficient strategies for tensor network contraction, so that larger 
values of the bond dimension x can be used and the bias to- 
wards weakly entangled states is suppressed, is therefore a 
priority in this research area. 

Refs. proposed the use of Monte Carlo sampling as a 
means to decrease computational costs in tensor network al- 
gorithms. [We note that there are other variational ansatze, 
such as so-called correlated product states, entangled plaque- 
tte states, and string-bond states, whose contractibility relies 
on sampling; see the introduction of Ref. |^for a review]. In a 
tensor network approach such as MPS, MERA or PEPS, sam- 
pling over specific configurations of the lattice allows to re- 
duce the cost of contractions (for single samples) from 0(\ p ) 
to 0(x 9 ), where q is significantly smaller than p, typically of 
the order of p/2. Needless to say, sampling introduces sta- 
tistical errors. However, if less than 0(x 9 ~ p ) samples are 
required in order to achieved some pre-established accuracy, 
then overall sampling results in a reduction of computational 
costs. 

The proposal of Refs. 0His based on computing the over- 
lap of the tensor network state with a product state (represent- 
ing the sampled configuration). As such, it cannot be directly 
applied to the MERA, because the overlap of a MERA with 
a product state cannot be computed efficiently. Luckily, as 
discussed in Ref. Jfjj, a sampling strategy specific to unitary 
tensor networks (such as MERA and unitary versions of MPS 
and tree tensor networks) is not only possible, but it actually 
has several advantages. Most notably, sampling takes place 
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FIG. 1 : (Color online) Tensor network diagram for a binary MERA 
presenting the state \9) of a translation invariant lattice C made of 
L = 12 sites and with periodic boundary conditions. The tensors on 
each layer are identical and their labels are displayed to the left. The 
U^ n ' tensors (green rectangles) are unitary operators (acting top-to- 
bottom), W^ n ' (cyan triangles) are isometric, and ip (red circle) is a 
normalized 'wavefunction'. 



over configurations of a reduced, effective lattice; and it is 
possible to perform perfect sampling, by means of which un- 
corrected configurations are drawn directly according to the 
correct probability. 

The main goals of this paper are to propose a variational 
Monte Carlo scheme for the MERA and to demonstrate its 
feasibility. We also discuss possible future applications. Let 
us briefly list some of the highlights of the approach, (i) In 
a lattice of size L, the sampled configurations correspond to 
an effective lattice of size C(log(L)); in this way, the cost of 
evaluating the expectation value of a local observable scales 
just as C(log(L)) and not as <D(L) as in Refs. SJS (ii) 
We employ the perfect sampling strategy of Ref. LKj thus 
avoiding the loses of efficiency in the Markov chain Monte 
Carlo of Refs. 0, M due to equilibration and autocorrelation 
times, (iii) Variational parameters are optimized while explic- 
itly preserving the unitary constraints that the tensors in the 
MERA are subjected to. This is accomplished by a steep- 
est descent method within the set of unitary tensors, which is 
much more robust to statistical noise than the singular value 
decomposition methods employed in MERA algorithms with- 
out sampling^ 

We demonstrate the performance of our approach by com- 
puting an approximation to the ground state of a finite Ising 
chain with transverse magnetic field. For the binary MERA 
under consideration, sampling lowers the costs of elementary 
contractions from 0(x 9 ) to 0(x 5 )- We find that the result- 
ing (approximate) ground state energy decreases as the num- 
ber of samples is increased, thus obtaining a demonstration of 
principle of the approach. We also notice that the number of 
samples required to achieve a given accuracy increases as the 
transverse magnetic field approaches its critical value. 

To our knowledge, this is the first instance of sampling- 
based optimization of a relatively complex tensor network. 
Previous similar optimizations included that of an MPS, 8 
which is a considerably simpler tensor network (with only 
three-legged tensors), and of tensor networks that under sam- 
pling break into smaller, simpler tensor networks (e.g. into 
MPS, single plaquette states, etc)i 7 ' 9 i 12 i 13 In more complex 
tensor networks, such as MERA and PEPS, the optimiza- 
tion becomes much harder due to high sensitivity to statisti- 



cal noise. Thus, for instance, Ref. HU spells out a full varia- 
tional Monte Carlo approach for PEPS but uses an alternative 
method, not based on sampling, in order to optimize the ten- 
sors. Indeed, in Ref. H3I sampling is only used to aid in the 
computation of expectation values. Here, instead, we use sam- 
pling both to optimize the MERA and to compute expectation 
values. 

We emphasize, however, that our results only demonstrate 
a gain over optimization schemes based on exact contractions 
(i.e. without sampling) in the low accuracy regime, where 
only a relatively small number of samples are required. The 
specific MERA (namely binary MERA for a one-dimensional 
lattice) and low value of x (x = 4) considered here for illus- 
trative purposes implies that the cost per sample is x 9 /x 5 = 
X 4 = 256 times smaller than an exact contraction. Recall 
that the statistical error decreases only as ^/N with the num- 
ber N of samples. If more than 256 samples are required in 
order to obtain a sufficiently accurate approximation of the ex- 
act contraction, then the sampling scheme may be overall less 
efficient than the exact contraction scheme. 

The advantage of sampling over exact contraction schemes 
is expected to be more evident in MERA settings where the 
cost 0(x p ) scales with a larger exponent p, and for larger val- 
ues of x- In particular, we envisage that the method described 
in this paper, possibly with further improvements, will im- 
prove the range of applicability of MERA in two and higher 
dimensions. 

The content of this paper is distributed in five more sec- 
tions. In Sec. HU we discuss methods for sampling with the 
MERA. In Sec. [HI] we propose an optimization scheme us- 
ing sampling techniques. In Sec. [lV]we benchmark the ap- 
proach with the quantum Ising model. In Sec. [V] we discuss 
future applications including extensions to higher dimensions 
and extracting long-range correlations, before concluding in 
Sec. EH 



II. LOCAL EXPECTATION VALUES WITH SAMPLING 

In this section we explain how to use sampling in order 
to speed-up the computation of expectation values with the 
MERA. We present both complete and incomplete perf ect 
sampling strategies, building on the proposals of Ref. [Kj for 
generic unitary tensor networks. We also discuss the impor- 
tance of the choice of local basis in sampling. We start by re- 
viewing some necessary background material on the MERA. 



A. MERA, expectation values and causal cones 

The MERAi is a variational wavefunction for ground states 
of quantum many -body systems on a lattice. The state 
of a lattice £ made of L sites is represented by means of a 
tensor network made of two types of tensors, called disentan- 
glers and isometries. The tensor network is based on a real- 
space renormalization group transformation, known as entan- 
glement renormalization: disentanglers are used to remove 





FIG. 2: (Color online) Expectation value = (* c |yi|* c ) of 

the local operator A (yellow rectangle), which acts on (at most) three 
neighboring sites. The flipped tensors, on the bottom half of the di- 
agram, are the Hermitian conjugates of the respective tensors above. 
The 'causal cone' is delimited by the dotted line in the left diagram, 
corresponding to (3 f |j4|5'). The tensors outside the causal cone can- 
cel, significantly simplifying the diagram on the right, corresponding 
to (* c |i|* c ). 

short-range entanglement from the system, whereas isome- 
tries are used to coarse-grain blocks of site into single, ef- 
fective sites. An example of a MERA on a periodic ID lattice 
with L = 12 sites is depicted in Fig.Q] This structure is called 
'binary' MERA because of the 2-to-l course-graining trans- 
formation in each repeating layer. Ascending upwards in the 
figure, the disentanglers £A"' remove short-range entangle- 
ment in between each course-graining transformation, imple- 
mented by isometries until the remaining Hilbert space 
is small enough to deal with directly with some wavefunction 
(p. 

The MERA can also be viewed in the reverse — starting 
from the top of Fig. Q] we descend downwards in a unitary 
quantum circuit, adding (initially unentangled) sites in each 
layer. For instance, let us flow downwards in Fig. Q] To a 
three-site system in state ip, we first add three additional un- 
entangled sites, turning it into a six-site system; and later we 
add another six unentangled sites, producing the final twelve- 
site system. This unitary structure can be exploited when cal- 
culating the expectation value (A) = (^f\A\^) of a local op- 
erator A acting on a few neighboring sites. Specifically, all 
tensors not 'causally' connected to the few sites supporting A 
cancel, as depicted in Fig. [2] The resulting diagram is signifi- 
cantly simpler and can be interpreted as the expectation value 
(^ C \A\^ C ) of A for a state |* c ) of an effective lattice C c 
made of 0(\og(L)) sites (see Fig.0. We emphasize that, by 
construction, 

= (* c |i|* c ). (1) 

Therefore, we can evaluate the expectation value (^ , |y4|v]>) by 
contracting the tensor network corresponding to (\& |A|^ C ). 
The numerical cost of performing this contraction grows lin- 
early with the number of sites in C c , and thus only logarith- 
mically with the number of site L in the original lattice C. 

The dimension of the Hilbert space after each course- 
graining transformation is an adjustable parameter, the bond 
dimension x> which plays a central role in the present dis- 



FIG. 3: (Color online) States \^) and |'3' c ) represented by the 
MERA. (a) Tensor network for the state \9) of the original lattice 
C The causal cone is delimited by a discontinuous line, (b) Tensor 
network for the state | 1 5 c } of the effective lattice C c . Sites further 
from the center effectively represent increasingly large length scales 
in the original lattice. 

cussion. Increasing the bond dimension x implies including a 
larger fraction of the original Hilbert space and leads to greater 
accuracy, but also requires greater computational resources. 
Optimization algorithms to approximate ground states, and to 
evaluate local expectation values and correlators, are present 
in the literatureJi The numerical cost of finding an expecta- 
tion value or performing a single optimization iteration using 
the binary MERA scales as 0(x 9 logL) for a translation in- 
variant system. For more complex MERA structures, such as 
those representing 2D lattices, the power of x for the cost in- 
creases dramatically. For instance, the 2D MERA presented 
in Ref. [Unas a numerical cost of 0(x 16 log L), which on cur- 
rent computers restricts x < 8. For many systems, this does 
not allow for enough entanglement to accurately describe the 
ground state, limiting the accuracy of the approach. 

Here we hope to alleviate this problem by reducing the nu- 
merical cost as a function of x using Monte Carlo techniques. 
We will find that the cost of a single sample scales as 0(x 5 ) 
for binary ID MERA, compared to the C(x 9 ) cost of the 'ex- 
act' contraction. 



B. Monte Carlo sampling with the MERA 

Our goal is to compute the expectation value (A) = 
(^\A\^>) by contracting the tensor network corresponding to 
(^ c | ^4 1 M> c ) , see Fig. [2] The first step is to re-express the ten- 
sor network contraction as a summation over indices corre- 
sponding to the sites of the effective lattice, as shown in Fig.|4] 
We then get 

(A) = J2^ C \r)(r\A\^ c ), (2) 
yen 

where 7Z is an orthonormal basis of product states |r) = |ri)® 
\r?) . . . on the effective lattice C c . 

We will approximately evaluate the sum in Eq. © by using 
Monte Carlo sampling over the states |r) E 1Z. Notice that 
in the effective lattice, the sites away from the support of A 
have undergone one or more course-graining transformations. 
In other words, sites further from the center represent increas- 
ingly larger length-scales. Thus, sampling over sites of the 
effective lattice corresponds to sampling the system at differ- 
ent length scales. This property is reminiscent of global or 
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FIG. 4: (Color online) Tensor network for (* C |A|* C ). The dashed 
lines indicate the indices that will be sampled. On the right-hand- 
side we explicitly write the tensor contractions outside the causal 
cone as a sum over a complete, orthonormal set of 'wavef unctions' 
|r) = |n) ® |ra) 55 . . . (pink circles). Monte Carlo sampling will 
be performed over this set. Each term of the sum can be expressed 
as the product (^ c \r) (r\A\^ c ) . 



cluster updates used in existing Monte Carlo methods to solve 
critical systems. 

A naive scheme for approximating the sum in Eq. <j2j> 
would be to choose |r) at random from TZ, and evaluate 
(^ c |r) (r| A|\E fC ) according to the tensor networks in Fig. |4] 
The cost of obtaining a single sample scales as C(% 5 logi). 
However, the statistical variance of a sampling scheme can 
be substantially reduced by implementing importance sam- 
pling — in this case choosing configurations |r) that are more 
likely. More precisely, sampling is implemented according to 
the wavefunction weight, P(r) = |(^' c |r)| 2 , which can be 
calculated efficiently as indicated in Fig. [5] We can express 
Eq. (O in a form more convenient for importance sampling, 



(A) = J2P(r)A C (r), 



where 



A L 



(^ c |r)(r|i|^ c ; 
(* c |r)(r|# c ) 



(r|i|fr c ; 
(r|* c ) 



(3) 



(4) 



Note that because the MERA is normalized by construc- 
tion, and therefore the weights sum to one, J^ren ^( r ) = 
However, during sampling only some subset TZ of the configu- 
rations are considered. One needs to renormalize the weights 
accordingly, so that the expectation value (A) is approximated 



(i)«^P(r)A c (r)/£p(r) 



(5) 



r£lZ 



1. Complete perfect sampling 

In the case of MERA, and indeed any state that can be writ- 
ten as a unitary quantum circuit, a 'perfect' sample can be 



generated according to the probability distribution P(r) in a 
single sweepJS This makes Markov Chain Monte Carlo un- 
necessary, simplifying the algorithm and eliminating a source 
of statistical error (i.e. autocorrelation effects). This is one ad- 
vantage of this technique over other tensor network sampling 
methods in the literature^ " 9 ' 14 

The sample can be constructed by sampling just one index 
at a time. Beginning at the top layer of the MERA, and aiming 
to sample just the first index (left-most in Fig. |4](b)), we can 
construct the one-site reduced density matrix p\ by the tensor 
contraction in Fig. |5](b). The probability P(j"i) = (ri |pi |ri) 
can then be found for all possible \r\). A value of \r\) is 
then randomly selected according to any complete basis of 
our choosing. 

After this selection is made, we can then sample the 
next (top-most) index, according to the conditional weights 
P( r 2 | p) as calculated by the diagram in Fig. 0(b) (we refer 
to Ref. 10! for further details). We continue to sample the state 
of each site, until we have sampled every site. 

Each of the diagrams in Fig.|5]can be calculated with cost 
C(x 5 ), while there are O(logL) layers to the MERA. A sin- 
gle sample can therefore be generated with cost C(x 5 logL), 
compared to 0(\ 9 log L) for the exact contraction. So long as 
the number of samples N is significantly less than \ A , Monte 
Carlo sampling will be faster than exact contraction. 

In practice, we will perform A 1 samples in order to 
get a good estimate of (A). As the samples are completely 
uncorrelated, the variance of A c (r) 



Ya.r[A c ] = ^ P(r) (A c {r) - A c ) 



can be used to estimate statistical error AA, 



AA = JVar[A c ]/N, 



(6) 



(7) 



while Var [ A c ~\ is itself upper bounded by the variance of the 
operator to be measured, 



Vav[A c ] < (A 2 ) - (A)^ 



(8) 



It is easy to show that the upper bound is saturated when sam- 
pling in the diagonal basis of A. However, the actual value 
of this variance is dependent on the choice of basis {|r}}, as 
well as the state being sampled. Therefore, it is worth spend- 
ing some effort in order to minimize this quantity. 



2. Incomplete perfect sampling 

The statistical noise is caused by Monte Carlo sampling of 
the tensor contraction, and it stands to reason that sampling 
less indices in the tensor network diagram (see Fig. SJ would 
reduce the statistical error. Indeed, it is possible to exactly 
contract the three physical indices at the lowest level (those 
connected to operator A) while keeping the computational 
cost at C(x 5 logL), as depicted in Fig. [6] That is, we sam- 
ple over configurations r* € TZ° of the effective lattice C c 
minus the three central sites on which A is supported. 
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OR (d) 
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(e) 



(f) 



<Pi-i 





FIG. 5: (Color online) Diagrams required to create a sample, sequentially sampling indices from the top to the bottom of the MERA. First 
the reduced density matrix, pi (blue circles), of an index is computed. We then randomly generate a state |rj) according to the diagonal basis 
of pi, and proceed to calculate Pi+i. In each 'layer' of the MERA, we sample the outermost indices in (a,b) and one of the inner indices in 
(c,d), depending on the physical location we are sampling. A projected, three-site wavefunction for the following layer is calculated in the 
appropriate diagram (e) or (f), from which the algorithm begins again with (a), until we have sampled all the indices indicated in Fig.|4]or|6] 



This method effectively generates (unnormalized) three- 
site wave functions |</Jo(r )) = (r°|\l/ c ) from the reduced 
density operator (as seen in Fig. [5] (e,f)). Conversely, 
the reduced density matrix for the three central sites is 
2r° l ( / 5 o( r °))('Po( r<> )|- Importance sampling is now achieved 
by selecting r° according to the weight 

>><r^ c > = <^(OMO>. 



(a) (* c |r°}(r°|# c ) 



(b) (* c |r*)^(r°|* c ) = 



P(r«) = 



(9) 



In turn, perfect sampling of the sites proceeds exactly as be- 
fore, but it stops before the three central sites, which are not 
sampled. We define the estimator 



(* c |r°)(r*|1' c ) 
whose expectation value obeys, 

(A) = P{t*)A*(t 



(y»o(rO)|i|y» (rO)) 
(^ (r«)Mr<>)) : 



r°eTl c 



in analogy with Eq. (0. Notice that, by construction, the sta- 
tistical variance Var[A°] is smaller or equal than Var[ J 4 c ] 
in Eq. ([S]), and therefore the numerical accuracy is increased 
without affecting the computational cost. 



3. Diagonal basis selection 

In general, we are free to choose any complete basis 1Z (or 
72.°) from which to draw individual samples. A good choice 
is one that produces a small statistical error, Eqs. ( 17181 ). In- 
tuitively, the goal of importance sampling is to decrease the 
statistical variance by choosing configurations r (or r°) with a 
large overlap with the state \ ^ c ). With this in mind, one could 
aim to maximize the 'average' weight, 



(12) 




(10) 



(11) FIG. 6: (Color online) Tensor network diagrams for the quantities 
(a) (* c |r )(r <> |* c ) = P(r) and (b) (* c |r°)i(r°|* c ) within- 



complete sampling. 



It is easy to show that for a given quantum probability dis- 
tribution, specified by a density matrix, the above quantity is 
maximized in the diagonal basis of the density matrix. In- 
spired by this fact, here we choose to sample site i in the basis 
in which the reduced density matrix pi is diagonal. The \ x X 
density matrices pi, calculated in Fig. [5] can be diagonalized 
with cost 0(x 3 )- Note that the chosen basis will depend on 
previously sampled sites, and that the resulting sampling basis 
is still a complete, orthonormal basis of product states. 

We have found that this approach can radically increase the 
average value of the weight, Eq. ( fT2l ) with the effect becom- 
ing stronger for larger systems and values of \- More impor- 
tantly, we find that the statistical variance in the observables 
is decreased (see Sec. IIV Al and Fig. [8]). 

This technique to select the sampling basis is not specific 
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to unitary tensor networks nor perfect sampling methods, and 
could thus be of benefit to other variational quantum Monte 
Carlo algorithms. 



4. Operators that decompose as sum of local terms 

Finally, we may wish to compute the expectation value of 
an operator that is the sum of local terms, such as a Hamilto- 
nian made of nearest-neighbor interactions: 



(13) 



In this case we sample each local term Hi as indicated pre- 
viously, noticing that the causal cone of each Hi depends on 
the location of the sites of lattice C where the local operator 
is supported. One can either choose to (uniformly) sample the 
position i in the lattice, or systematically sweep through all 
the positions i. A complete sweep, where each site i is visited 
once, costs 0(\ 5 LlogL). 



III. ENERGY MINIMIZATION WITH SAMPLING 

In order to find an approximation to the ground state, we 
need to minimize the energy of the MERA. The direction of 
steepest ascent is given by the complex derivative with respect 
to the conjugate 16 of each element of each tensor. Inserting 
Eq. © (or Eq. ( fTUl i) into Eq. (0 and differentiating gives 



d(H) 

dlf(n)* 

d(Hj) 
QU(n)* 



E 



(14) 



1 d(fr c |r)(r|g t |^ c ) 
P(r) BUM* 

(Hj) d(^ c \r)(r\^ c ) 
P(r) dU( n > 



where the derivative with respect to a tensor is element-wise, 
and similar expressions hold for W^ n > and ip. The deriva- 
tives on the right-hand-side of Eq. (T5[ can be found by using 
the usual rules for calculating derivatives in the diagrams, see 



will be estimated 



Figs. [6]and[7p2 In practice (Hi) and d 
simultaneously by sampling. 

Multiple approaches are possible for updating the MERA 
to minimize the energy and find a good approximation to 
the ground state. One approach often used iteratively opti- 
mizes each tensor in the MERA according to the following 
algorithmic. 



usv 



d(H) 

QJj(n)* 



— > uv 



(16) 



In the above u and v are unitary, while s is diagonal and 
positive, thus representing the singular value decomposition 



(SVD) of the derivative. This algorithm finds the unitary ten- 
sor that minimizes the trace value of its product with 
the above, called environment — with the requirement that H 
is negative-semidefinite. Similar steps apply to W^ n ' and ip, 
where the SVD ensures that they remain isometric or normal- 
ized, respectively. 

Unfortunately, the above scheme is extremely sensitive to 
the statistical noise inherent to Monte Carlo sampling, and 
results in very poor optimization. Ideally, we would prefer 
a method in which the statistical noise is able to average out 
over many iterations. 

The most obvious scheme satisfying this requirement is 
straightforward steepest descent. Again, one must ensure that 
the tensors obey the unitary /isometric constraints characteris- 
tic of the MERA, so one can utilize the SVD to find the uni- 
tary tensor closest (with respect to the L2-norm) to the usual 
downhill update. With this method, the ith step is given by 



usv = U (n) 
-> uv, 



d(H) 



(17) 



where pt t is a number modulating the size of change at the step 
t. 

In this paper we avoid using the SVD entirely by explicitly 

remaining in the unitary subspace, along the lines of Ref. un . 

(n) 

We define the tangent vector Gy as the derivative projected 
onto the tangent space of all unitaries located about U^ n \ 



G 



(n) 



9(H) 

QUin)* 



jtM d(H) {n) 

dU (n)* U ■ 



(18) 



(25) The matrix U^—fitG^ is within O(fj^) of a unitary matrix. 



Noting that U^^Gy is anti-Hermitian, then the update 



[/(") jj( n ) 



exp 



[/(")t 



d(H) 
gjj(n)* 



QUin)* 



(19) 

both travels in the direction of the tangent vector while U^ n > 
remains precisely unitary. The same approach works for nxm 
isometric matrices, taking care that n < m, with computa- 
tional cost scaling similarly to the SVD approach as 0(n 2 m) 
(see Appendix). 

The performance of the algorithm is highly dependent on 
the behaviour of \it, as well as the number of Monte Carlo 
samples, N t , taken in each step. Simple schemes will keep 
fit and N t constant, which is the approach we take here. On 
the other hand, one may choose to increase Nt with t so that 
harmful noise is reduced when approaching the optimal solu- 
tion; or to decrease /i t with t for much the same reason; or a 
combination of both. 
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FIG. 7: (Color online) Examples of tensor network diagrams to find the derivatives of (ty c \r <> )A(r <> \ty c ) and (^ c |r°)(r°|5 ,c ) with respect to 
the tensor C/' 2 ^ (i.e. the transpose of those in Eq. l !15b ). The derivative with respect to a particular tensor is given by it's environment, or the 
contraction of all the other tensors in the diagram. The chain rule produces multiple terms where a tensor appears more than once. 



IV. BENCHMARK: CRITICAL QUANTUM ISING CHAIN 

In this section we demonstrate the above techniques with 
the well-known transverse-field quantum Ising model, 

# = -E fi * 5 *+i +w i- (20) 

i 

Such a Hamiltonian can be expressed as a sum of nearest- 
neighbour terms Hi, such that 

Hi = -a?a! +1 - h/2(af + af +1 ). (21) 

We will pay particular attention to the region around the crit- 
ical point at h = 1, which is the most demanding computa- 
tionally. For concreteness, we use a three-layer binary MERA 
with periodic boundary conditions, resulting in a lattice of 24 
sites in the bottom of the MERA structure. However, each of 
these sites corresponds to a block of 3 physical spins, mak- 
ing a total of 72 spins. We choose this blocking so that for 
X < 2 3 = 8 the bond dimension only ever decreases when 
ascending through the MERA. In what follows, we employ 
incomplete perfect sampling where three sites at the bottom 
of the MERA are contracted exactly. 

A. Expectation values 

We now analyze the effectiveness of extracting expectation 
values from the MERA using Monte Carlo sampling. For per- 
fect sampling techniques, the accuracy can be easily extracted 
from the variance using Eq. 0. The scaling of the error in 
the energy, AE oc N" 1 ' 2 , is shown explicitly for the critical 
{h = 1) system in Fig. [8] (a). The variance of the energy es- 
timator E as a function of h is shown in Fig. 0(b). Here we 
have used MERA wavefunctions previously optimized using 
standard techniques^ without Monte Carlo sampling, that is, 
sampling is only employed to extract the expectation values. 



Notice that the variance is maximal near the critical point 
at h = 1. As the Monte Carlo code effectively samples wave- 
functions from the reduced three-site (i.e. nine-spin) density 
operator, one would expect the energy variance to increase 
with the amount of entanglement in the system. For reference 
we have included the entropy of the three-site density matrix 
p(3) m Fig.[8](c). This entropy mostly^ corresponds to the 
entanglement entropy of three sites with the remainder of the 
system. We see a strong correlation between the amount of 
entanglement and the size of the variance of the energy esti- 
mator E. 

Let us emphasize that Fig.[8](b) shows that our scheme per- 
forms significantly better than directly sampling p^ in either 
the x or z spin basis. The measured variances are very similar 
to a diagonal sampling of p^ (i.e. in its diagonal basis). This 
indicates that the sampling scheme is performing as intended 
inSec. lHBll 

In general, an accurate representation of wavefunctions 
with greater amounts of entanglement will require greater 
bond dimension \- These results suggest that the statistical 
variance generated by this scheme will also increase with the 
entanglement. To achieve a certain precision in the expecta- 
tion value of local observables, the number of required sam- 
ples grows with this variance, and thus with the amount of en- 
tanglement and with the minimum suitable value of %. There- 
fore, although a single sample has cost C(% 5 ), the total cost 
to obtain a certain precision may have some additional de- 
pendence on x- Nevertheless, no additional dependence was 
clearly manifest in our simulations at fixed h. 



B. Optimization 

Finally, we combine Monte Carlo sampling with our 
unitary-subspace steepest descent algorithm to obtain opti- 
mized wavefunctions. In Fig. [9] we plot the energy of the 
MERA during the optimization process at h = 1 and \ = 4, 
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FIG. 8: (Color online) (a) The statistical error of the energy per site 
at h — 1 as a function of the number of samples TV follows the 
classic TV~ 1//2 scaling, (b) Variance of the energy estimator (nor- 
malized to the expectation value (H)) for optimized MERAs with 
bond-dimension \ = 8 for various values of h (red crosses). Esti- 
mates of the statistical uncertainty are smaller than the symbols. The 
grey area is eliminated by inequality Eq. {8]l, bounded by the vari- 
ance of H. For comparison, we include the variances expected from 
several hypothetical samplings of p' 3 ' . Variances from sampling the 
spins in the z (or x) basis is indicated by a blue, dash-dot (or green, 
dashed) line. Our numerical results (red crosses) show remarkable 
similarity with sampling in the diagonal basis of p' 3 ' (black, solid 
line), (c) The entanglement entropy of p' 3 ' . In (b) and (c), the criti- 
cal point at h = 1 is indicated with a red dotted vertical line. 



where the simulation progresses through a range of different 
number of sweeps TV per optimization step. In all cases the 
step size is fixed at \i t = 0.1. We observe that increasing TV 
improves the quality of the optimized wavefunction and for 
large values of TV the simulation tends to converge towards the 
same energy obtained with exact contractions, as expected. 

Like all tensor network optimizations, care must be taken to 
ensure the wavefunction has fully converged to the lowest en- 
ergy state. For instance, in Fig.[9](a) we see a plateau in energy 




5000 10000 15000 

Iteration 



FIG. 9: (Color online) (a) Energy of a wavefunction with \ — 4 
during optimization. Each iteration is update using TV sweeps, where 
TV is 1, 2, 4, 8 in the black crosses, red pluses, green diamonds and 
blue points respectively. Every 100 iterations, we calculate the exact 
energy corresponding to the current wavefunction, which is plotted 
here. The solid horizontal line indicates the energy of an optimized 
X = 4 MERA using exact contractions and steepest descent (which 
remains ~ 7 x 10 -4 above the the true ground state energy, indi- 
cated by the dashed line). The simulation converges for large TV, 
but x ma Y need to increase for greater accuracy, (b) Difference to 
the above solid line plotted on a logarithmic scale. The difference 
reduces with increasing TV, and although statistical fluctuations are 
decreasing, they remain evident on the logarithmic scale. 



before around the 4000th iteration that could be mistaken for 
convergence (whereas the simulation is actually navigating a 
stiff region, i.e. a long narrow valley in the energy landscape). 
Non-deterministic features due to statistical fluctuations can 
also be seen — such as the sudden increase of energy of the 
N = 4 simulation around the 13000th iteration. 

Beyond this, accuracy could be improved by increasing \. 
It should be noted that we have observed that the steepest de- 
scent method (with either exact contractions or sampling) will 
not always produce wavefunctions of the same quality as the 
SVD method as it may be more susceptible to local minima 
or extreme stiffness. However, accuracy can still be systemat- 
ically improved by increasing \- 

In the previous section we noted that the statistical uncer- 
tainty peaked around the critical point at h = 1, where the 
entanglement is maximal, and one might expect the optimiza- 
tions to be most difficult around this point. Plotted in Fig.fTOl 
is the difference in energy between our wavefunctions and the 
exact, analytic solution for a range of h. We observe that the 
error decreases away from the critical point, and that there is 
a clear relationship between the quality of the wavefunction 
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FIG. 10: (Color online) The difference between the final optimized 
energy and the true ground state for a variety of field strengths h. 
The markers indicate the number of Monte Carlo sweeps taken be- 
tween updates, N — 1,4, 16, 64 for the black cross, red diamond, 
green circle and blue square, respectively. There is a clear trend for 
improved ground state energy as TV increases, and away from the 
critical point at h — 1 (vertical red dotted line). 



and the number of samples per step, TV. 

There are several possible limiting factors in variational 
Monte Carlo optimizations of MERA wavefunctions. One 
must balance the cost of increasing TV, \ an d the total num- 
ber of iterations, to produce results of the desired accuracy. 
On top of this, the ansatz presents a complicated optimiza- 
tion landscape and one must be careful not to be stuck in local 
minima. 

There is much scope to improve on the above optimiza- 
tion scheme by using more sophisticated approaches. Most 
obviously, the step-size /j, t and number of samples TV t per- 
formed in each iteration could be adjusted as the simulation 
progresses. For example by choosing the step size to decrease 
as m oc l/i Q , with TV fixed and < a < 1 we are guaran- 
teed convergence to some local minimum 18 . Equivalently, the 
noise could be reduced at each step by increasing TV oc t' 3 , or 
some combination of both. 

In Ref. H it was found that using just the sign of the deriva- 
tive, as well as properties resulting from translational in- 
variance, was sufficient for optimizing a periodic MPS with 
Monte Carlo sampling. Other approaches existing in the lit- 
erature may result is significant gains, though it should be 
noted that approaches requiring the second derivative or Hes- 
sian matrix would increase the order of the numerical cost as a 
function of x, and would have to be made robust to statistical 
noise. 



V. FUTURE APPLICATIONS 

There are several situations where it is most natural to use 
Monte Carlo sampling to speed-up MERA algorithms. Let us 
briefly review them. 
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FIG. 1 1 : (Color online) (a) The ternary MERA structure has isome- 
tries transforming 3 sites into 1. (b) This two-layer binary MERA 
structure has added disentanglers which may help represent long 
range entanglement, (c) Sampling scheme for the lowest layer of 
the 3x3-to-l MERA described in Ref.B 



A. Different MERA structures 

In this work we have considered in detail the binary MERA 
for a ID lattice with translation invariance. However, even 
in a translationally invariant, ID lattice, one has freedom to 
choose between a large variety of entanglement renormaliza- 
tion schemes, leading to MERA structures with different con- 
figurations of isometries and disentanglers. 

The ternary MERA, shown in Fig. [TTJ (a), has a narrower 
causal cone, with a width of just two sites, and the traditional 
algorithms for optimizing it have a cost that scales as 0(x 8 )- 
As a result, the ternary MERA is sometimes favored over the 
binary MERA. Note that this does not necessarily translate 
into an improved accuracy in expectation values — the ternary 
MERA is in general less accurate than binary MERA for the 
same value of \- 

It is interesting to note that, in the ternary MERA, the per- 
fect sampling algorithm presented here again has cost 0(\ 5 ), 
while Markov chain Monte Carlo, as well as expectation value 
and environment estimation, is possible with cost 0(x 4 )- It 
is unclear whether after including autocorrelation effects the 
Markov chain method performs better, similar, or worse over- 
all compared to the perfect sampling algorithm. 

Another possible ID MERA includes two layers of disen- 
tanglers to account for entanglement over larger distances, as 
depicted in Fig.QT|(b). This MERA has a causal cone that is 
five sites wide, and traditional algorithms would have numeri- 
cal cost 0(x 12 ) an d require memory 0(x 10 ) — limiting x to 
rather small values. However, a sampling technique will only 
require 0(x 7 ) time per sample and 0(x > ) memory overall — 
a huge saving. Note that the power roughly halves when we 
change from an exact contraction which effectively rescales 
density matrices from higher layers to lower, to a Monte Carlo 
scheme which samples wavefunctions from this distribution. 

The scaling of computational cost in x m 2D lattices is even 
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FIG. 12: (Color online) Causal cone of two operators separated by 
17 sites in a 4 layer binary MERA. Monte Carlo sampling can be 
performed on the red points, with a cost 0(x 8 )- An exact contraction 
of the expectation value costs 0(x 12 )- 



more challenging, mostly because the width of the causal cone 
(or number of indices included in a horizontal section of the 
causal cone) is much larger. Once again, sampling wavefunc- 
tions will require roughly square-root the number of opera- 
tions (and memory) needed to calculate the exact reduced- 
density matrix. For instance, in the 3x3-to-l MERA pre- 
sented in Ref. U, the cost of an exact contraction scales as 
0(% 16 logi), while with Monte Carlo sampling it is possi- 
ble with just 0(\ s log'L) operations per sample (depicted in 
Fig-Q~]]( c ))- Memory might be a limiting factor in 2D MERA 
algorithms, while the temporary memory required for this al- 
gorithm is less than that to store the disentanglers and isome- 
tries. 



B. Long-range correlations 

Another challenge with MERA calculations is the numer- 
ical cost of long-range correlations. Take for instance the 
two-site operator (AiBj), for arbitrary sites i and j. The cost 
required to contract the corresponding tensor network within 
the binary MERA scheme can scale as much as 0(\ 12 ) — 
significantly more than the C(x 9 ) c °st for neighbor and next- 
nearest-neighbor correlations. 

However, an estimate of the correlator can be obtained us- 
ing Monte Carlo sampling at a reduced cost (per sample). In 
Fig- ESI we depict the causal cone structure of two single-site 
operators separated by i — j = 17 sites in a binary MERA. 
The cost of Monte Carlo sampling for (AiBj) is just 0(x 8 )- 

This technique can be extended to 2D lattice systems, 
where calculating long-range correlations exactly quickly be- 
comes infeasible, even for modest values of x- Note again 
that memory constraints are particularly challenging for 2D 
MERA calculations and Monte Carlo sampling can alleviate 
this burden. 



VI. CONCLUSION AND OUTLOOK 

We have outlined and tested a scheme for Monte Carlo sam- 
pling with the MERA. Uncorrelated samples can be efficiently 
generated directly from the wavefunction overlap probability 
distribution, without needing to resort to Markov chain Monte 
Carlo methods. From this, expectation values can be extracted 
and we have demonstrated techniques to reduce the statisti- 
cal error. We have also presented and demonstrated an algo- 
rithm to optimize MERA wavefunctions using sampled en- 
ergy derivatives. 

The numerical results presented here were not intended to 
be state-of-the-art solutions of the ID quantum Ising model, 
but rather to demonstrate feasibility and motivate subsequent 
applications to 2D systems. In general, Monte Carlo sampling 
becomes more advantageous for systems with large numbers 
of degrees of freedom, and we expect 2D MERA to be no ex- 
ception. Because the reduction in cost in two (and higher) di- 
mensions is so significant, Monte Carlo techniques are a very 
attractive way to achieve reasonable values of x with current 
computers. 

Obvious improvements to the code include utilizing sym- 
metries and parallelization to supercomputers, which is 
straightfoward with our perfect sampling algorithm. Further 
research into optimization strategies may lead to other im- 
provements (e.g. by reducing the number of iterations or the 
tendency to find local minima). Re weighting techniques^ may 
make the optimization more efficient when approaching the 
ground state. 
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Appendix 

In this appendix we explain how to compute, with cost 
0(n 2 m), the n x m isometric matrix 

F = Aexp(A f B - B f A), (22) 

where A is an n x m isometric matrix, B is a general m x n 
matrix, and n < m. The naive approach would be to evaluate 
the m x m matrix A^B — B^A and compute its exponential, 
with cost 0(m 3 ), before multiplying by A. However, noting 
that the exponent does not have full rank (the rank is at most 
2n), we can hope to find a faster method. 
Taking the Taylor expansion 

F = A + B- AB^A + ^ (BA^B - BB^A 

- AB^B + AB^AB^A) + (23) 
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we observe that the result can be achieved with a series of 
multiplications between n X n matrices C = AB\ and 
D = BB\ post-multiplied by either A or B, requiring total 
cost 0(n 2 m). 

The following algorithm calculates the Taylor expansion to 
order p. 

C <- AB^; D <- BB^ 

W <- I; X <- 0;Y <- I; Z <- 

for i = 1 — > p do 

T < (W + XD)/i 

X <- (W + XC)/i 
W 

Y<-Y + W 



Z <- Z + X 

end for 

F <— Y A + ZB 

In the binary MERA, where isometries are x x X 2 matrices 
(i.e. n = x, m = x 2 ), the cost of this algorithm scales as 
0(% 4 ), compared with the cost 0(\ 6 ) of the naive approach. 
This algorithm becomes particularly important for a tree ten- 
sor network and for the MERA in two dimensions, where the 
naive approach becomes more expensive, in powers of \, than 
a sampling (thus becoming the bottle neck of an optimiza- 
tion based on sampling), whereas the above algorithm remains 
competitive. 
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Notice that the second term in the brackets of Eq. l |15t arises 
from the change in the normalization of the wave function. Al- 
though we are dealing with unitary/isometric tensors which ensure 
(* c |* c ) = 1, small chan ges in arbitrary directions may break 
the unitarity and modify the norm. Interestingly, even when pro- 
jecting into the unitary tangent space (see below) where this term 
averages to zero, its inclusion is important to reduce the sampling 
error — sometimes by several orders of magnitude. 
There may be very small contributions to the entropy resulting 
from the anisotropy of the MERA. Also note that H and 
share a Z2 symmetry which is broken by the MERA wavefunc- 
tion for h < 1, and the exact ground state should have S 1 = ln2 
entanglement entropy as h — s> 0. 



